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Abstract:  W^consider  ill-posed  problems  of  the  form  g(t)  =J^’  K(t,s)f(s)ds  and  their  discrete 
approximations  obtained  by  quadrature.  Ax  = b.  We  assume  that  our  desired  solution  f is 
smooth  and  that  our  data  g is  measured  experimently  and  contains  highly  oscillatory  noise. 

theorems  and  examples  we^demonstrate  the  effect  of  each  of  these  procedures,  the 
singular  value  decomposition  with  truncation,  (SVDT)  a Hankel  trans£.9rmation  with  damping, 
and  the  Tikhonov  regularization  procedure,  on  such  noise  in  the  data. , jWe  demonstrate!  that  in 
general,  regularization  is  the  most  natural  setting  for  mollifying  the  effects  of  such  noise. 
However,  for  certain  problems  SVDT  is  equally  suitable  and  in  fact  may  be  better  if  the  rate 
of  convergence  of  the  regularization  procedure  is  too  slow. 


• This  Research  was  sponsored  by  the  Air  Force  Office  of  Scientific  Research  (AFSC), 
United  States  Air  Force,  under  contract  F44620-76-C-0022. 
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1.  Introduction:  We  consider  the  following  type  of  ill-posed  problems.  In  the  continous  case 
we  consider  a linear  integral  equation  of  the  first  kind 


g(t) 


Jq 


K(t,s)  f(s)ds 


(1) 


where  K is  an  L2-  kernel  and  we  assume  that  the  null  space  of  K,  N(K),  is  empty.  Then  K 
is  a compact  operator,  and  its  inverse  is  unbounded.  In  the  algebraic  case  we  consider  a 
square  nxn  system  of  linear  equations 


Ax  = b 


(2) 


where  A is  invertible,  but  ill-conditioned.  As  a measure  of  ill-conditioning  we  use  the  ratio 
of  the  largest  to  the  smallest  singular  values.  In  our  discussions  we  will  consider  (2)  as  a 
discrete  approximation  to  (1),  obtained  using  a quadrature  rule.  Some  papers,  for  example, 
Faddeev  and  Faddeeva  [1],  Zhukovskii  and  Morozov  [2],  and  Tikhonov  [3]  consider  the 
regularization  of  general  systems  (2)  not  necessarily  related  to  a continuous  problem  (1). 
However,  we  will  see  that  in  the  context  of  the  current  discussion  it  may  not  be  reasonable  to 
consider  general  systems. 


A problem  is  ill-posed  if  small  changes  in  the  data  - in  problem  (1)  the  data  is  g(t).  in 
problem  2 it  is  b - can  yield  large  changes  in  the  solution. 

Example.  In  (1)  let  K(t,s)  =«  0 t<s 

1 t > s 

Then  solving  (1)  is  equivalent  to  differentiating  g.  If  we  replace  g by  gg=g+e  sin«t  for 
small  £ and  large  u.  we  see  that  for  the  solution  of  (1).  f^  corresponding  to  gg,  the  error  f^ 


I - g'  s £ u cos<at.  is  large. 


Page  2 


r 

r 

I 

i 


r 


L 


The  application  of  standard  numerical  analytic  techniques  to  the  solution  of  (1)  or  (2) 
yields  non  physical,  highly  oscillatory  solutions.  Equations  of  the  form  (1),  arise  in  many 
applications,  for  example  the  numerical  differentiation  of  tabulated  data,  the  deconvolution  of 
data  obtained  in  spectroscopy  experiments,  inverse  problems  in  geophysics,  and  in  signal 
processing,  such  as  radar  and  sonar.  For  examples  of  some  of  these  applications  see  the 
following  references  Bachmann  et  al  [4],  Huang  and  Parrish  [S]  and  Backus  and  Gilbert  [6], 

We  emphasize  the  fact  that  we  want  a solution,  not  just  at  one  point  of  an  interval,  but 
over  the  entire  interval.  Moreover,  we  are  dealing  with  experimentally  determined  data  so  this 
data  almost  surely  contains  errors,  perhaps  both  systematic  and  random.  The  following 
discussion  assumes  that  any  systematic  errors  have  been  removed  from  the  data  supplied  to 
(1).  In  many  cases,  the  random  error  obtained  is  highly  oscillatory  and  of  low  amplitude,  and 
the  desired  measurement  is  smooth.  It  is  this  situation  that  we  want  to  discuss. 

In  recent  years  several  schemes,  see  for  example,  Hanson  [7],  Varah  [8],  Ekstrom  and 
Rhoads  [9],  Tikhonov  [10],  and  Lee  et  al  [11]  have  been  proposed  for  solving  (1)  and  (2). 
We  will  discuss  3 of  these,  the  singular  value  decomposition  with  truncation  (SVDT)  Hanson 
[7],  Varah  [8];  the  Hankel  transformation  procedure  of  Ekstrom  and  Rhoads  [9];  and  the 
regularization  procedure  of  Tikhonov  [10].  We  will  attempt  to  give  some  understanding  of  the 
relationships  between  these  3 procedures  and  of  the  strengths  and  weaknesses  of  each  of  these 
procedures  by  presenting  a few  Theorems  and  examples.  We  will  argue,  using  ideas  from 
Anderssen  and  Bloomfield  [12]  for  numerical  differentiation,  that  of  the  3 procedures,  the 
regularization  procedure  yields  the  most  natural  resolution  of  the  problem  of  handling  noisy 
data.  We  argue  by  example  that  in  general,  the  SVDT  procedure  may  not  mollify  the  effects 
of  the  noise  in  the  data.  The  Ekstrom-Rhoads  procedure  lies  somewhere  between  SVDT  and 
regularization.  This  Hankel  transformation  procedure  is  very  interesting  and  does  mollify  the 
noise.  We  can,  however,  show  by  example  that  it  may  also  mollify  the  desired  solution  which 
is  something  we  do  not  want  to  do.  The  key  to  our  problem  is  an  appropriate  choice  of  basis. 
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see  for  example  Varah  [13].  To  decide  what  basis  is  appropriate  we  must  first  decide  what  the 
objective  of  our  computation  is.  What  would  we  like  to  be  able  to  do? 


First  we  state  two  basic  premises. 

Premises 

1)  We  assume  that  the  desired  solution  is  smooth  with  one  or  more  derivatives.  2)  We 
assume  the  data  contains  highly  oscillatory  additive  noise.  See  Figure  1. 

Given  these  premises  our  objective  is  as  follows. 


Objective 

To  approximate  our  original  problem  by  a better  conditioned  problem  that  ( I ) reduces  the 
influence  of  the  noise,  and  (2)  gives  us  a physically  meaningful  solution  that  approximates  the 
true  solution  in  some  reasonable  sense.  We  will  demonstrate  by  example  in  the  section  on 
SVDT  that  modifying  a problem  so  that  it  is  numerically  stable  does  not  necessarily  guarantee 
that  we  have  simultaneously  mollified  the  significant  part  of  the  noise.  .Clearly,  the  errors  in 
the  data  impose  a limitation  on  the  achievable  accuracy. 


Figure  1 


We  note  in  passing  that  in  certain  situations,  for  example  in  the  registration  system  of  an 
electron  beam  column  Wilson  et  al  [14]  noise  removal  is  achieved  by  simply  running  the  same 
experiment  many  times  and  then  averaging  the  data  obtained  over  the  many  runs.  With  truly 
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random  data  with  zero  mean  averaging  over  a set  of  experiments  will  eliminate  the  noise. 
However,  there  are  many  situations  where  it  is  impractical  or  even  wrong  to  run  the  same 
experiment  many  times.  See  Chang  [IS]  for  example  which  discusses  Auger  spectroscopy.  In 
Auger  analysis  a beam  of  electrons  is  shot  at  a specimen.  Certain  types  of  electrons  subse- 
quently emitted  from  the  specimen  are  collected  and  used  to  determine  the  chemical  compos- 
ition of  the  specimen.  However,  each  time  the  specimen  is  exposed  to  the  beam,  deterioration 
of  the  specimen  occurs  and  in  fact  the  sample  at  the  end  of  an  experiment  may  not  be  identical 
to  what  it  was  in  the  beginning.  Therefore,  noise  averaging  is  not  feasible. 

As  stated  earlier,  in  the  preceeding  context  we  will  discuss  3 methods  SVDT,  Ekstrom  and 
Rhoads,  and  regularization  keeping  the  2 premises  and  our  stated  objectives  in  mind.  We 
conclude  that  regularization  is  the  most  direct  of  the  3 methods  for  handling  any  random  noise 
in  the  data.  We  note,  however,  that  in  certain  situations  SVDT  also  effectively  handles  the 
noise.  We  further  note  that  the  goodness  of  a regularization  approximation  depends  strongly 
upon  the  rate  of  convergence  of  the  approximations,  and  for  some  problems  this  rate  as 
pointed  out  by  Franklin  [16]  can  be  very  slow.  In  such  a situation  a SVDT  may  yield  a better 
result  than  regularization.  Thus,  the  use  of  regularization  is  not  as  straightforward  as  some 
articles  would  indicate.  This  discussion  should  not  be  interpreted  as  a proposal  that  any  of  the 
3 methods  not  be  used.  It  is  only  an  attempt  to  interpret  each  of  these  3 procedures  in  terms 
of  what  they  do  to  any  noise  in  the  given  data.  As  the  discussion  proceeds  we  will  see  that 
many  areas  of  applied  mathematics  come  into  play.  Functional  analysis,  integral  equation 
theory,  quadrature  rules,  numerical  linear  algebra,  Fourier  analysis,  statistics,  and  filtering 
theory  are  all  used. 

2.  The  Singular  Value  Decomposition  with  Truncation  (SVDT). 

Picard's  Theorem,  Smithies  [17],  gives  necessary  ar  1 sufficient  conditions  for  the  existence  of 


a solution  to  (I). 
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Theorem  [17].  Let  K(t,s)  be  an  L2  - kernel  for  (t,s)  in  [0,1]  x [0,1],  and  let  g e L2  [0,1], 
Then  (1)  has  an  L-,  - solution  f if  and  only  if 


n=l 


(g'^u^)^  < » 


and  g is  in  the  closure  of  the  range  of  K,  R(K).  In  (3)  Up,  is  a full  set  of  eigenfunctions  for 
the  operator  K K*  and  is  the  corresponding  set  of  eigenvalues,  + 0 as  n + oe.  K*  is 
the  adjoint  operator  of  K,  and  its  kernel  is  K*(t,s)  = K(s,t). 

Theorem  2 [17].  Under  the  hypotheses  of  Theorem  1,  equation  (1)  has  the  unique  solution  in 
the  orthogonal  complement  of  N(K), 


» 1 

2 Vp 

n=l  a_ 


where  the  Vp  are  the  eigenfunctions  of  the  operator  K*K. 


The  system  [Up],  [Vp],  [«„},  n = 1,2,...  is  the  singular  value  decomposition  of  the 
operator  K,  Smithies  [17]. 

That  is. 


K - 2 


Similarly,  for  the  matrix  A,  we  obtain 


A . USVT 


where  the  columns  of  U and  V are  the  vectors  Uj Up  and  Vj Vp  respectively,  and  2 is 

the  diagonal  matrix  whose  non  zero  entries  are  Oj,  1 < j < n.  The  oj  are  called  the  singular 
values  of  K or  A.  We  note  that  the  closure  of  the  span  of  the  Vp  , n = 1,2,...  is  the  closure 
of  the  range  of  K*  , R(K*),  which  equals  the  orthogonal  complement  of  the  null  space  of  K, 
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N(K)-*-.  Moreover,  the  closure  of  the  span  of  the  u„,  n = 1,2,...  equals  the  closure  of  the 
range  of  K,  R(K),  which  equals  the  orthogonal  coraplement  of  the  null  space  of  K*,  NfK*)-*-. 
Since  we  assume  N(K)  = {0},  we  have  that  the  v„  span  all  of  L2. 

We  note  that  there  is  a generalization  of  the  SVD  due  to  Van  Loan  [18]  for  real  matrices 
of  the  form  A + B.  It  is  called  the  BSVD  and  uses  3 matrices  U,X,  and  V with  U and  V 
orthogonal  matrices  and  X nonsingular,  plus  2 diagonal  matrices  and  Z3.  The  resulting 
decomposition  is 

U^AX  = 2a 

(7) 

V^BX  = Eg 

If  we  denote  the  non  zero  entries  in  2^  by  oj  1 < i < n , and  those  in  Eg  by  1 < i < 
n,  then  the  B - singular  values  of  A 

/i(A,B)  = Ifi  \ ii>0,  det  (aTA  - = 0} 

are  given  by 

m(A.B)  = I 1 < i < n}  . 

We  have  stated  both  the  SVD  and  BSVD  results  for  square  matrices,  but  both  apply  to 
mxn  rectangular  matrices.  The  BSVD  with  truncation  does  not  seem  to  have  been  used  in  the 
literature,  one  exception  is  Varah  [13].  We  will  not  directly  consider  the  BSVD;  however,  the 
comments  that  we  make  about  the  SVD  with  truncation  also  apply  to  a BSVD  with  truncation. 
The  basis  of  interest  in  that  case  is  the  columns  of  X. 

The  SVD  with  truncation  (SVDT)  is  easier  to  discuss  in  the  algebraic  framework  (2),  so 
we  consider  (2).  For  any  real  matrix  A we  have  a SVD 


A - U2VT 


(8) 


p.. — 
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where  U and  V are  orthogonal  matrices,  U an  ortho-normal  set  of  eigenvectors  of  AA^, 
V an  ortho-normal  set  of  eigenvectors  of  A^A  and  2 a diagonal  matrix  whose  entries  are 
the  square  roots  of  the  eigenvalues  of  AA^,  Oj  + as  j + n.  The  columns  in  U and  V are 
ordered  to  match  the  ordering  in  2.  In  our  problems  A is  square  and  nonsingular,  so  none  of 
the  diagonal  entries  in  2 vanish. 

A SVDT  is  achieved  by  considering  the  entries  in  2 and  deciding  which  entries  are  'not 
significant'  Varah  [13],  Lawson  and  Hanson  [19],  'Insignificant'  entries  are  set  equal  to  zero 


obtaining  2 = / 2 , Oyrom 

Vo  oj 

^ 'l 

2 = \^0  22/  . 


0/ 

Then  the  equation 
Ax  = U2VTx=  b 


(9) 


is  solved. 

This  approach  assumes  that  the  projection  of  the  data  onto  the  singular  vectors  corresponding 
to  those  singular  values  reduced  to  zero  is  small.  In  fact  it  assumes  more.  Namely,  that  the 
projection  g^u^  amplified  by  l/a„  is  small,  Varah  [9]  discussed  this  in  detail  and  derives 
some  error  estimates.  We  have  the  following  simple  lemma. 


Lemma  SVDT  is  equivalent  to  projecting  the  given  data  onto  the  space  spanned  by  the 
'significant'  singular  vectors.  U in  U. 

Proof  of  Lemma 

Replace  b by  ¥ = U U^b.  Then  the  solution  ?of  Ax  = b satisfies”  = V 2"’  ^ 


uTb  = v^r'^u^b  = v^r'o^u^b  = xjvdt- 


solution  of  (9). 


O ED 
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At  this  point  it  is  appropriate  to  note  that  if  our  problem  Ax  = b is  a discretization  of  ( 1 ) 


then.  Theorem  1 tells  us  that  the  requirement  that  the  discreted  data. 


8n  ="  (gn'^Uj)  Uj 
1 


have  the  property  that  | g„^Uj  | <<  oj  for  j near  n,  is  not  totally  unreasonable  if  n is 


'large’.  Theorem  1 requires  that 


(g^Uj)2 


Obviously  the  sum  in  (11)  can  be  finite  only  if  the  individual  terms  in  the  sum  go  to  zero. 


(Note  that  in  (10)  we  have  used  Uj  to  denote  a vector,  and  in  (11)  to  denote  a function.) 


Thus,  chopping  makes  sense  in  this  setting,  although  for  a general  problem  (2)  there  is  no  a 


priori  reason  why  the  above  requirement  is  reasonable.  Varah  [8]  estimated  the  effect  of 


roundoff  error  on  the  SVDT  as  M/a^  where  M depends  on  the  machine  arithmetic  used  and 


in  (9)  oj  has  been  set  to  zero  for  j > k.  He  did  not  discuss  the  effect  of  noise  in  the  data. 


although  roundoff  is  frequently  given  a uniformly  distributed  random  model.  Voevodin  [20]. 


Suppose  we  are  given  b^  = b + e where  s is  randomly  generated  noise,  with  mean  0. 


For  large  n we  therefore  expect  e to  have  componentwise  alternating  signs.  What  is  the 


effect  of  e ? What  does  a SVDT  do  to  such  an  error?  Since  our  system  is  linear,  the  addition- 


al error  in  the  solution  of  (9)  due  to  e is 


I UM 


U^e 
I Ix|  I 


Is  (12)  small?  Clearly,  (12)  will'be  zero  only  if  e is  orthogonal  to  the  vectors  uj,  1 < j < k. 
For  (12)  to  be  small  we  must  have  e essentially  orthogonal  to  these  Uj.  Any  nonzero  projec- 
tion of  € onto  these  Uj  must  be  sufficiently  small  with  respect  to  the  corresponding  projec- 


V'* 


i 


f 

;■ 


I 


1 

I 


i 


[ 
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tion  of  the  true  data,  b,  onto  these  vectors.  If  we  recall  our  2 premises,  (1)  our  solution  is 
smooth  and  (2)  the  error  is  highly  oscillatory,  then  this  requirement  that  (12)  be  small 
translates  into  a requirement  that  the  Uj,  1 < j < k-1  do  not  pick  up  highly  oscillatory 
behavior. 

Consider  for  example  the  oscillatory  matrices  Gantmacher  [21]  or  at  least  matrices  with 
the  following  oscillation  property. 

Definition  1^.  A matrix  A with  eigenvalues  Aj  + j = l,...,n  has  the  oscillation  property  if  the 

following  is  true.  There  are  m < n distinct  eigenvalues  /xj  + j = 1 m,  and  there  is  a 

corresponding  complete  basis  of  eigenvectors  such  that  each  of  the  basis  vectors  corresponding 
to  Hj  has  exactly  (j-1)  variations  of  sign  in  its  coordinates. 

Definition  2.  A rectangular  matrix  A is  called  totally  nonnegative  (totally  positive)  if  all  its 
minors  of  any  order  are  nonnegative  (positive). 

Definition  3.  A matrix  A is  called  oscillatory  if  A is  totally  nonnegative,  A is  nonsingular, 
and  all  the  elements  in  the  principal  diagonal  and  the  first  superdiagonal  and  subdiagonal  are 
nonzero. 

Oscillatory  matrices  arise  in  the  study  of  small  vibrations  of  elastic  systems  Gantmacher 
and  ICrein  [22).  Oscillatory  matrices  have  the  oscillation  property. 


i Theorem  3 Gantmacher  and  Krein  [22].  An  oscillatory  matrix  A always  has  the 

i oscillation  property. 

1 


Observe  that  if  the  matrix  A in  (2)  has  the  oscillation  property,  then  SVDT  can  remove 
the  effects  of  highly  oscillatory  noise  whenever  the  desired  solution  is  smooth.  However,  if  A 
does  not  have  this  property  SVDT  may  have  little  effect  on  the  noise.  Before  presenting  an 
example  to  demonstrate  this  we  note  the  following.  In  Varah  [8],  he  considered  3 classical 

Lj 
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continuous,  ill-posed  problems  (1)  harmonic  continuation,  (2)  the  inversion  of  a LaPlace 
transform,  and  (3)  the  backwards  heat  equation.  It  is  interesting  to  note  that  although  he  did 
not  explicitly  mention  measurement  errors,  the  discrete  versions  of  these  3 problems  that  he 
used  in  his  computations  each  have  the  oscillation  property  stated  in  Definition  1.  In  fact  the 
matrices  used  for  the  LaPlace  transform  and  for  the  heat  equation  are  both  totally  positive  and 
hence  oscillatory.  Thus,  each  problem  in  Varah  [8]  was  expanded  in  a frequency  or  oscillation 
oriented  basis,  and  truncating  the  singular  value  decomposition  was  equivalent  to  removing  the 
high  frequency  components  from  the  data.  In  such  a situation  SVDT  achieves  our  stated 
objectives. 

Now,  however,  consider  the  following  discussion  that  shows,  SVDT,  may  not  mollify  the 
effects  of  noise  in  the  data.  We  consider  the  family  of  circulant  matrices  Gray  [23]. 

Definition  4.  An  nxn  matrix  A is  a right  circulant  if  and  only  if  j 

integers  0 < i.  j,  h < n-1  where  the  indices  are  computed  modulo  n. 

• 

For  example  the  4x4  matrix 

(a  b c d\ 

d a b c \ 

c d a b J 

bed  a/ 

is  a right  circulant.  A circulant  has  only  n independent  entries.  Each  successive  row  is  the 
successive  cyclic  permutation  of  the  elements  in  the  first  row  of  the  matrix.  The  sum,  product, 
transpose  and  inverse  of  a circulant  matrix  is  a circulant  matrix.  Observe  that  circulants  occur 
in  the  problem  of  analytic  continuation  Varah  [8].  Given  a harmonic  function  u(r,0)  in  the 
unit  circle  with  known  values  for  some  r < 1,  g(8)  = u(t,6),  find  its  values  f(8)  on  the  unit 
circle,  r = 1.  f and  g satisfy. 


1 A”  l-r2 

g«l)  = / ^f(0,)dO, 

2iry  Q 1 - 2r  cos(0-0i)  + r^ 


(13) 
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If  we  discretize  at  equally  spaced  points  using  the  trapezoidal  rule  and  the  fact  that  the 
kernel  and  the  solution  are  periodic,  we  obtain  a symmetric  circulant  equation  as  an  approxi- 
mation to  (13). 

The  fact  that  the  right  circulants  form  a commutative  ring  can  be  verified  directly.  Let  C 

be  a right  circulant  whose  first  row  is  (cq,  C] c„_i).  Let  rj  = exp(2irij/n)  where  i = , 0 

< j < n - 1.  The  rj  are  the  n,  nth  roots  of  unity.  Then  the  vectors  Vj  = wj/vn  where 

WjT  = (1,  rj,  rj2,  ....  rj"-')  , 0 < j < n - 1 (14) 

form  a unitary,  eigenvector  basis  for  C.  The  corresponding  eigenvalues  are 

\:  = 2 Ck  . (15) 

> k=0  “ J 

If  C is  symmetric,  then  we  must  have 
Ck  = Cn-k 

for  k = 1 (n/2)-l  if  n is  even,  and  for  k = 1,  ...,  [n/2]  if  n is  odd.  Note  that  C()  and, 

if  n is  even,  c„/2-  singletons.  In  the  symmetric  case  we  therefore  get 

^k  = ^n-k  ^ 

for  k = 1,  ...,  (n/2)-l  , if  n is  even,  and  for  k = 1,...,  [n/2]  if  n is  odd.  Note  that  \q  and.  if 
n is  even,  have  multiplicity  1. 

We  want  to  construct  an  ill-posed  symmetric  circulant  matrix  that  does  not  have  the 
oscillation  property  in  Definition  1.  Clearly  the  vectors  Wj  as  a function  of  increasing  j 
have  the  appropriate  oscillatory  behavior;  the  | Xj  | however  do  not  have  to  decrease  in  size  as 
j t.  To  verify  this,  consider  the  following  lemma. 


Proof;  To  verify  Lemma  2 we  first  note  from  (15)  - (17)  that  the  would  have  to  satisfy 
with  q = [n/2] 


q 

= Cq  + 2 2 C|j 

1 

q 2ffkj 

Xj  = Cq  + 2 2 C|j  cos . 1 < j < q (19) 

1 n 


Observe  that  the  jth  row  of  the  coefficient  matrix  of  (19),  if  we  let  our  unknown  vector 
be  (cq/2,  C|,  Cq),  is  the  function  cos  2irjt  evaluated  at  equally  spaced  points  in  the  interval 
[0,1].  But  we  know,  see  for  example  Bloomfield  [24]  these  functions  are  orthogonal  over 

equally  spaced  points.  Therefore,  (19)  has  a unique  solution  Cq c^  for  any  set  of  Xj.  Now 

define  C to  be  the  circulant  whose  first  row  is  Cg,  C|  c^,  c^,  Cq.|,...,  C|. 

Q.E.D. 


By  Lemma  2,  for  any  odd  n there  is  a symmetric  circulant  with  eigenvalues  Xq  = e 
’ ^(n/21  “ ^[n/2)+i  = other  Xj  > as  specified.  For  the  SVD  of  Cg  we  obtain  ctj 

a ff2  “ “ ^j(k)’  ®k  “ 2 n - 1.  and  = e . Using  SVDT,  and  setting  a„  = 0 , 

we  would  take  out  of  the  basis  the  vector  w,  = (1,1,  ..,  l)/'/n  . Obviously  this  would  have 
little  effect  on  the  error  since  it  would  not  affect  any  of  the  oscillatory  basis  vectors  wj,  j > 1. 

Thus,  an  ordering  of  the  Xj(C)  according  to  decreasing  magnitude,  does  not  necessarily 
correspond  to  increasingly  oscillatory  patterns  in  the  eigenvectors.  There  is,  in  general  no 
direct  link  between  a frequency  space  analysis  of  a given  problem  and  its  eigensystem.  The 


T 
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circulants  give  us  a concrete  family  in  which  we  can  readily  see  this  behavior.  We  could  of 
course,  construct  an  example  trivially.  For  if  A is  any  oscillation  matrix,  then  A”'  is  not. 
However,  we  introduced  the  circulants  because  they  give  us  an  interesting  class  of  matrices 
that  we  will  be  able  to  use  again  in  the  section  on  regularization.  Thus,  we  see  that  without  an 
oscillation  property  requirement  there  is  no  correlation  between  the  SVDT  and  the  removal  of 
highly  oscillatory  noise.  Since  roundoff  error  also  has  a random  element,  this  remark  also 
applies  to  that  kind  of  noise.  SVDT  yields  a numerically  well-conditioned  problem  and  this  is 
an  important  consideration.  We  can  also  argue  that  our  solution  should  not  have  any  signifi- 
cant part  on  the  ill-conditioned  subspace  that  has  been  projected  out.  However,  in  general 
(for  example,  for  systems  without  the  oscillation  property)  if  we  use  SVDT  the  error  in  the 
data  is  not  mollified  and  thus  appears  in  our  solution.  We  would  like  to  have  a procedure  that 
would  be  numerically  stable  and  that  also  would  remove  or  mollify  the  error. 


3.  A Hankel  Transformation  Procedure.  Ekstrom  and  Rhoads  [9]. 


First  we  restate  our  assumptions  and  our  objective.  We  are  assuming  the  desired  solution 
is  smooth.  We  are  also  assuming  that  our  data  g (or  b)  is  contaminated  by  a highly  oscillato- 
ry, but  small  amplitude  error.  We  want  to  replace  our  original  ill-posed  problem  (1)  or  (2)  by 
a better  conditioned  problem  that  mollifies  the  effect  of  the  noise  in  our  data  on  the  solution, 
and  that  gives  us  a physically  meaningful  solution  that  approximates  the  true  solution  in  some 
reasonable  sense.  As  was  shown  in  section  2,  the  SVDT  approach  is  equivalent  to  projecting 
the  data  onto  the  'significant’  part  of  the  space.  The  difficulty  as  demonstrated  in  section  2 is 
that  this  projection  may  not  mollify  the  effects  of  the  error,  unless  the  matrix  in  equation  (2) 
has  the  oscillation  property.  The  Ekstrom  and  Rhoads  approach  [9]  also  uses  a singular  value 
decomposition,  but  with  a weighting  system  instead  of  a truncation.  This  weighting  can 
effectively  mollify  the  effect  of  the  noise.  However,  by  example,  we  will  demonstrate  that  it 
can  also  mollify  the  desired  solution,  and  this  we  do  not  want  to  do. 


r'""’’"'-"— " "V  " • — ■ — ' ■ ‘ 
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The  discussion  in  Ekstrom  and  Rhoads  [9]  is  restricted  to  convolution  equations.  That  is 
in  (1) 


i ‘ 

k .■ 


r4  , ^ 


K(t,s)  = K(t-s)  (20) 

and  in  (2) 

Aij  = Ai.j.  (21) 

However,  we  will  argue  that  the  ideas  in  [9]  can  be  extended  to  general  matrices,  if  we  use  the 
singular  value  decomposition.  We  should  note  that  many  interesting  applications,  for  example, 
numerical  differentiation,  Cullum  [25],  Anderssen  and  Bloomfield  [12],  and  the  signal  process- 
ing of  radar,  Preiss  [26],  satisfy  (20)  or  (21).  Therefore,  even  if  we  restrict  the  discussion  to 
convolution  equations,  we  are  considering  a large  class  of  interesting  problems.  A matrix 
satisfying  (21)  is  called  a Toeplitz  matrix.  If  (20)  is  satisfied  and  we  discretize  (1)  appropri- 
ately (2)  will  be  a Toeplitz  system.  However,  one  should  note  that  preservation  of  the 
convolution  character  depends  upon  the  quadrature  formula  used. 

Definition  5.  A matrix  A is  persymmetric  if  it  is  symmetric  with  respect  to  the  secondary 
diagonal,  aj(n.j+n  .1  5 j £ n . Clearly,  a Toeplitz  matrix  is  persymmetric. 

Lemma  3.  If  Ax  » b is  a persymmetric  matrix  and  P is  the  matrix  of  all  zeros  except  for  I's 
on  the  secondary  diagonal,  then  the  matrix  H =«  AP  is  a symmetric  matrix.  Furthermore,  x = 
Py,  where  y is  the  solution  of  Hy  * b,  is  the  solution  of  Ax  = b. 

Proof.  Since  A is  persymmetric. 

^ij  = an-j+l.n-i+i  for  1 < i.  J < n . (22) 

Moreover,  since  H = AP,  hjj  = But,  (22)  implies  that  = aj(n.j^D  , the 

i.j  entry  in  H^.  Therefore,  H is  symmetric.  Furthermore,  if  we  set  x = Py,  then  Ax  = APy 


Hy  - b. 
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Q.E.D. 

Therefore,  we  can  transform  our  system  Ax  = b into  an  equivalent  symmetric  system  Hy  » b. 
Since  A is  Toeplitz,  H is  a Hankel  matrix. 

Definition  6.  A matrix  A is  an  nxn  Hankel  matrix  if  and  only  if  ajj  = hj^j,  0 < i,  j < n-1. 

In  a Hankel  matrix  the  elements  along  the  secondary  diagonal  are  all  equal,  as  well  as  all 
elements  on  diagonals  parallel  to  the  secondary  diagonal. 

Ekstrom  and  Rhoads  [9]  make  this  transformation  and  work  with  H.  Since  H is 
symmetric  it  has  a full  orthonormal  eigensystem  Vj,  1 < j < n.  They  assume  all  the  eigenva- 
lues Xj,  1 < j < n are  distinct.  One  can  write  the  solution  of  Hy^  = b + e as 

n 

y,  = 2 ^jVj  (23) 

1 

where 


Pi 


bTvj  e^^Vj 

i—  + !- 


1 S j $ n 


(24) 


The  error  in  (23)  due  to  c depends  upon  e^Vj  and  Xj  . To  mollify  this  error,  a weighting 
sequence  W(Xj)  is  introduced.  That  is,  they  replace  (23)  by 


n 

y(e.«)  »2  Wj^jVj  (25) 

1 


They  consider  several  choices  for  W , using  the  formula 


W. 


I Xj  I + 


(26) 


it'- 
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where  aj  is  to  be  specified.  For  example,  they  consider  setting  aj  equal  to  some  mulitple  of 
the  variation  in  the  components  of  the  individual  eigenvector,  Vj,  For  example. 


= a 2 (Vj'"*'*  - Vj‘)2 


aj  = a 2 (Vj'+*  - 2 Vj'  + Vj'-*)2. 

i-2 


J 


A comment  on  (27)  is  necessary.  These  are  the  relationships  stated  in  Ekstrom  and  Rhoads. 
Observe  that  the  first  variation  since  vj  is  normalized  can  be  at  most  4.  Therefore,  hidden  in 
the  choice  of  a are  the  effects  of  the  span  of  the  and  the  dimension  n,  and  we  cannot  say 
a is  always  small  or  large  since  its  size  will  depend  on  n and  the  scaling.  To  directly 
incorporate  the  effect  of  n,  we  write  (27)  so  that  the  expressions  look  more  like  difference 
quotients,  and  obtain 


^ i=l[  l/n  J 


and  similarly  for  the  second  variation.  Then  if  | Xj  | < n^,  (28)  can  yield  a significant 
damping  factor  with  a small  value  for  a.  When  we  discuss  regularization  we  will  be  consider- 
ing derivatives  and  small  values  of  the  regularization  parameter,  so  we  choose  to  use  (28)  in 
our  discussions  to  make  the  relationships  between  these  2 procedures  more  transparent. 
Ekstrom  and  Rhoads  [9]  also  mention  setting  aj  a a for  all  j,  this  clearly  would  not  connect 
the  weighting  to  any  oscillatory  behavior.  One  must  decide  how  to  choose  a.  Ekstrom  and 
Rhoads  give  the  following  heuristic.  They  observe  that  in  the  tests  they  ran,  the  residual  | | 
b + e - Ax(a)  | | decreased  initially  as  a was  reduced  and  then  increased  as  a was  further 
reduced.  Furthermore,  the  value  of  a for  which  the  residual  was  minimized,  was  a good 
approximation  to  the  value  of  a that  minimizes  the  error. 


Page  17 


The  Ekstrom-Rhoads  approach  is  interesting,  especially  if  one  has  a medium  size  problem 
that  is  to  be  solved  over  and  over  again.  In  such  a situation  the  eigenelements  can  be  deter- 
mined once,  stored  and  used  over  again  on  any  set  of  data.  For  very  large  problems  one 
would  not  want  to  have  to  compute  the  eigenelements.  but  this  is  also  true  of  a SVDT. 
However,  in  many  applications  the  number  of  data  points  is  reasonable.  Use  of  the  filter  (28) 
gives  us  a way  of  reducing  the  effects  of  the  highly  oscillatory  eigenvectors,  and  of  any  small 
eigenvalues  corresponding  to  smooth  eigenvectors.  Thus,  we  can  reduce  the  effect  of  the 
measurement  errors,  and  at  the  same  time  reduce  the  numerical  instabilities  due  to  the  very 
small  eigenvalues.  The  Ekstrom-Rhoads  approach  [9]  is  an  SVD  with  damping  of  the  effects 
of  oscillatory  eigenvectors  and  of  the  small  eigenvalues;  in  place  of  the  block  filter  of  the 
SVDT  which  may  have  no  direct  relationship  to  oscillatory  behavior.  This  damping  mimics 
regularization.  Thus,  this  approach  lies  somewhere  between  an  SVDT  and  a regularization. 
The  weighting  in  Tikhonov  regularization,  however,  has  a functional  analytic  interpretation; 
whereas  it  is  not  clear  what  a similar  interpretation  for  the  weights  in  (28)  would  be.  Observe 
in  particular  that  the  weights  Wj  as  a function  of  j need  not  be  a monotone  decreasing 
function  of  j and  in  general  may  have  any  shape.  By  example  we  will  show,  however,  that 
this  approach  does  not  necessarily  satisfy  our  objectives  either.  We  construct  a problem  where 
half  of  the  components  of  2 of  the  eigenvectors  are  oscillatory,  but  where  the  other  compo- 
nents are  not.  If  such  a frequency  split  occurs,  the  Ekstrom-Rhoads  approach  will  damp  out 
the  projection  of  the  solution  on  these  vectors  and  may  damp  out  a significant  part  of  the  true 
solution. 

Example  2.  Let  n = 2m.  Let  e be  the  m - vector  of  all  I's.  Let  h be  the  m - vector  (1. 
-1.1,  -1....).  Then  z,  = (e,  h)  and  = (e.  -h)  are  n - vectors  and  the  vectors  Vj  = Zj/vh  are 
orthonormal.  There  exists  a complete  orthonormal  basis  for  n - space,  V = iV|.  v,....}.  Let 
> \2  > •••  > \„  > 0 be  any  set  of  eigenvalues  and  set  A — V.W^  where  .V  is  the  diagonal 
matrix  whose  jth  diagonal  entry  is  Xj. 
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We  want  to  solve  Ax  — b + e where 
b s A(e,0)  and  e 31  (0,5h).  Clearly  the  solution  for  £ 0,  equals 

•/fi 

y = - — (v,  + V2  ) 

and  the  error  in  the  data, 

vH 

€ = — (S(V,  - Vj). 

Therefore,  for  example  with  \|  = 1/10,  X2  = 1/20,  the  computed  solution  is 
y^  = j^W,(l  + 10  5)v,  + W2(l-20a)v2]. 

I 

The  corresponding  error  in  the  solution  is 

y^  - y = [((W,-l)  + 10  W,5)v,  + ((W2  - 1)  - 20W2<5)v2j.  (30) 

The  Ekstrom-Rhoads  procedure  chooses  the  weights  W|  by  (28)  measuring  the  oscillation  in 
V|  and  V2-  Using  the  first  variation,  we  obtain  «|  » 2an(n-2)  and  02  = 2 a n^.  This 
choice  yields  damping  or  small  W^.  However,  we  see  in  (30)  that  to  get  an  accurate  solution 
we  need  each  Wj  ~ 1. 

The  problem  in  Example  2 is  the  mixing  of  the  frequencies.  The  eigenvectors  do  not 
necessarily  yield  a separation  of  the  frequencies.  We  note  that  if  A is  an  oscillation  matrix, 
then  the  Ekstrom-Rhoads  approach  is  a generalization  of  SVDT  to  include  the  effects  of 
oscillatory  behavior.  How  much  of  a generalization  it  is  depends  upon  the  particular  problem 
since  for  any  problem  (2)  obtained  from  (1),  we  expect  the  Xj"  + 0 as  n + . 
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Observe  that  unlike  the  SVDT,  this  approach  cannot  be  interpreted  as  a modification  of 
the  data;  it  is  instead  a modification  of  the  operator. 

H -♦  H = V 2 VT 

where  2 is  the  diagonal  matrix  whose  jth  diagonal  entry  is  (\j  + aj). 

To  extend  these  arguments  to  systems  that  are  not  persymmetric,  consider  the  following. 
Let  A = U 2 be  a singular  value  decomposition  of  any  matrix  A.  Thus,  V is  a system 
of  eigenvectors  for  A^A  and  U a set  for  AA^.  Our  solution  of  Ax  = b is  X = V2'‘  U^b. 
If  we  know  x is  not  oscillatory  we  can  damp  out  those  vj  and  Uj  with  highly  oscillatory 
behavior.  We  take 


X = 2 


(b^Uj)  Vj 


(31) 


with  a weight  that  measures  any  oscillation  in  Uj  or  vj'.  For  example  let 


= an^  2 F(Uj'‘''''  - Uj'^)  “ + (Vj*‘+’  - Vj'‘)  -1 

k=0 


(32) 


Using  (32)  we  can  apply  the  Ekstrom- Rhoads  approach  to  any  problem. 


At  the  beginning  of  this  paper  we  claimed  that  regularization  is  the  natural  setting  for 
handling  noisy  information,  because  it  has  a direct  frequency  interpretation,  in  the  next  section 
we  will  justify  this  comment.  The  justification  presented  is  valid  only  for  convolution  equa- 
tions. The  prototype  convolution  equation,  which  has  numerous  applications,  is  the  numerical 
differentiation  of  tabulated  data,  and  many  of  the  explanatory  comments  will  refer  to  this 


problem. 
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4.  Tikhonov  Regularization.  [10],  [28]. 


To  discuss  the  regularization  of  equation  (1)  or  (2),  we  restrict  ourselves  to  convolution 
equations  K(t,s)  = K(t-s)  or  Toeplitz  systems,  A = (ajj)  = (aj.j).  Thus,  system  (1)  becomes 


n 


g(t)  =/  K(t-s)  f(s)ds 
0 


0 < t < 1 


(33) 


We  make  this  restriction  because  we  want  to  introduce  Fourier  transforms.  However,  Tikho- 
nov regularization  is  applicable  to  general  equations  (1). 

The  Fourier  transform  of  an  Lt(-oo,»)  function  h is  defined  as 


'li( 


.).r 

-01 


e‘"‘  h(t)  dt 


(34) 


In  practice,  as  in  our  equation  (1),  we  often  have  a finite,  not  infinite  interval  of  integra- 
tion,  and  in  addition,  in  our  numerical  work  we  deal  with  sampled  information.  We  will  try  to 
incorporate  these  effects  into  our  remarks,  however,  we  will  not  present  a totally  rigorous 
analysis. 

As  stated  in  the  previous  section  an  example  of  (33)  is  the  computation  of  the  derivative 
of  a function.  Given  g find  f where 


h(t-s)  f(s)  c's 


(35) 


and  h(t-s)  = 0 for  t < s,  and  1 for  t > s. 


We  denote  any  equation  of  type  (1)  by  Kf  = g.  The  regularization  procedure  replaces 
(1)  by  a one-parameter  family  of  minimization  problems  which  we  denote  by  P(a).  a > 0. 
For  a given  a > 0,  we  define  P(a)  as 

minimize  [ | | Kf  - g | 1 ^ + a J2(f)l  (36) 

f 

where  | | • | | - is  some  norm  measure  of  the  error  (Kf  - g)  in  f being  a solution  of  (1), 
and  i2(»)  is  some  norm,  usually  different  from  | 1 • 1 1 which  controls  the  smoothness  of  f. 
For  example. 


Q(f) 


•/  n n 


(37) 


where  f’”  denotes  the  derivative  of  f. 


Definition  6.  The  family.  P(a)  , a > 0 is  a regularizing  family  for  Kf  = g if  (1)  as  a + 0,  the 
solution  f„  of  P(a)  converges  in  some  reasonable  sense  to  the  solution  of  Kf  = g,  and  (2) 
each  problem  P(a)  is  well-posed. 

Definition  7.  A problem  is  well-posed  if  it  has  a unique  solution  and  this  solution  depends 
continuously  on  the  data. 

Definition  7 is  the  Hadamard  defintion  of  well-posedness.  However,  we  should  note  that 
a problem  can  be  well-posed  but  ill-conditioned.  For  example  every  system  (2)  is  well-posed, 
since  we  assumed  A is  invertible.  However,  depending  upon  the  span  of  the  singular  values, 
this  system  can  be  arbitrarily  ill-conditioned.  In  numerical  work  we  want  well-conditioned,  not 
just  well-posed  problems  and  as  we  look  at  regularization  we  will  try  to  see  if  the  approximate 
problems  it  generates  are  in  fact  well-conditioned. 

In  practice  our  problems  contain  measurement  errors.  When  there  is  error,  we  cannot  let 
the  regularization  parameter  a become  too  small.  Note  that  even  without  such  measurement 
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errors,  there  are  errors  due  to  the  quadrature  approximations  and  due  to  roundoff.  These 
errors  also  limit  the  size  of  a.  In  practice  we  find  that  the  error  | | f j^ue  * I I * reduces 
initially  as  a is  reduced,  reaches  a minimum  which  depends  on  the  problem  and  the  type  of 
error  present  and  then  increases  again  as  we  continue  to  reduce  a.  Procedures  for  determining 
an  appropriate  value  of  a are  given  in  Wahba  [27],  Turchin,  Kozlov  and  Malkevich  [28], 
Gordonova  and  Morozov  [29], 


One  of  our  stated  objectives  is  to  replace  our  original  problem  by  a better  conditioned 
one.  We  must  determine  the  'condition  of  P(o)'.  The  Euler  necessary  conditions  of  optimality 
for  P(a)  are 


(K*K  + aB)  f = K*g  and  f"’(0)  = f"’(l)  = 0 


(38) 


where  B corresponds  to  Equation  (38)  is  simply  the  statement  that  at  an  optimum  point 
f of  P(a),  the  first  variation  of  (36)  must  vanish.  If  we  have  equations  (35),  and  (37)  the 
problem  of  differentiation,  equation  (38)  is  the  integro-differential  equation 


f(u)  du  + af  - = 


g(u)  du  , f"’(0)  = f*'*(l)  = 0 


(39) 


In  Cullum  [25]  an  additional  term  f was  added  to  the  left-hand  side  of  (39).  This 


corresponds  to  addingl 


ing(/o'  f)^ 


into  the  expression  in  (36).  To  solve  (36),  we  solve  (39). 


Using  a quadrature  rule  and  discretizing  the  derivative  in  (39),  we  obtain  an  algebraic  system 


(H  a B)  X = b (40) 

How  does  the  addition  of  the  operator  aB  affect  the  condition  of  K*K  or  H?  The  existing 
regularization  results  Anderssen  and  Bloomfield  [30],  Hilgers  [31]  indicate  that  in  many  cases 
the  best  choice  of  a is  a small  number,  for  example  10'*^  to  lo  '‘.  How  can  such  a small 
number  have  a significant  effect  on  the  condition  of  our  problem  K*Kf  = K*g  ? 


r 


r 


I 


hi  ■ 


1 


I 

1 

1 


I 
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Clearly,  if  S2(f)  =J^'  so  that  B = I,  the  smallest  eigenvalue  is  a and  the  condition 
of  (38)  is  essentially  l/.a.  So,  if  a is  very  small,  we  have  not  done  much  to  stabilize  our 
numerical  computations.  However,  if  we  take  Bf  = f-f*2>,  then  we  can  argue  that  we  may 
have  significantly  affected  the  condition.  As  is  also  noted  in  Lee  et  al  [11],  where  a very 
different  type  of  regularization  analysis  is  presented,  it  is  important  that  the  operator  B in 
(38)  be  an  unbounded  operator  whose  eigenvalues  have  a limit  point  at  infinity.  The  ill- 
posedness  of  (1)  is  due  to  the  compactness  of  the  operator  K;  compactness  implies  that  the 
singular  values  of  K have  a limit  point  at  0.  We  must  add  on  an  operator  that  can  counter- 
act this  decay.  To  achieve  some  feeling  for  condition,  let  us  consider  the  circulant  matrices 
again.  Note  that  any  circulant  is  also  Toeplitz.  If  the  first  row  of  our  circulant  C is  c = c^, 
....  c„.|  , then  we  recall  the  following  interesting  fact  that  the  jth  component  of  the  finite 
Fourier  transform  of  c . 0 < j < n - 1, 


That  is  the  Fourier  coefficients  are  just  the  eigenvalues  of  C.  In  (41)  Wj  is  the  jth  eigenvec- 
tor of  C given  in  (14). 


La 
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For  B = f - and  f(l)  = f(n),  a discretized  version  of  B is  the  symmetric  circulant 


-1 

0 

0 

-1 

2 + h2 

-1 

0 

0 

-1 

2+h- 

-1 

0 

0 

-1 

2-l-h2 

-1 

0 

0 

-1 

2+h^. 

where  h = Atj  is  fixed. 

Observe  that  B is  diagonally  dominant,  therefore  all  the  eigenvalues  are  positive  in  fact 
2 

Aj  = (1- cos(2ir  j/n))  + 1 (43) 

If  h = 1/n,  then  \j  = O(n-),  and  \„.j  = Xj  1 < j < n/2  (if  n is  even). 

Before  proceeding  we  state  the  following  theorem  which  is  proved  in  Cullum  [32]. 

Theorem  4.  Let  K be  a positive  definite,  symmetric  operator,  then  the  family  of 
equations,  a > 0, 


(K  + aB)f  = g.  f"*(0)  = f"’(l)  = 0 


where  B is  obtained  from  a norm  Q as  in  (38)  is  a regularizing  family  for  (1). 

Theorem  4 allows  us  to  use  (44)  in  place  of  (38).  Let  C be  a symmetric,  positive 

definite,  but  ill-posed  circulant  obtained  from  (1)  whose  first  row  is  c^.C] Cp.i,  and  consider 

the  equations 


(C  + oB)x  = b 

Since  the  matrix  D = C -I-  aB  is  symmetric  and  positive  definite,  the  condition  of  D is  just 
the  ratio  max  Xj(D)/minXj(D).  All  1-circulants  have  the  same  family  of  eigenvectors  so  we 
have  Xj(D)  = Xj(C)  + a Xj(B).  We  know  cs  the  discretization  is  refined,  that  min  Xj(C)  + 0 
and  max  Aj(B)  f x.  The  effect  of  aB  can  now  be  explained  by  giving  an  example. 
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Example  3.  Let  Aj(C)  = 1/j'’  and  Xj(B)  = for  some  q > 0,  s > 0 and  j = 1 n/2. 

Note  that  “ ^j(D),  since  D is  symmetric.  Then  the  minimum  of  Xj(D)  for  a given 


value  of  a is  not  less  than  the  minimum  of 


<^(w)  = — + o!  > 0 . 


But,  this  minimum  occurs  when  = [q/as]  , and  the  corresponding  minimum  value  of 
(45)  is  of  the  form 


For  example  if  s = 2 which  would  correspond  to  (42)  and  q = 2,  that  is  the  eigenvalues  of  C 
decay  like  *'^n^,  then 


X^i„  (D)  > 2 v'a  . 


Moreover,  let  c = 2 1 Cj  | , then 
0 

^max  (D)  < 4 a n2  + ? . 


Therefore,  the  condition  of  D . to  be  specific  take  s=  2, 


- < 2W+  — 

(D)  2v'a 


If  a = 10  and  n < lO"*,  this  condition  is  < 10^. 


From  (46)  we  can  see  the  effect  on  conditioning  of  increasing  the  order  of  the  smoothing 
norm,  i.e.  increasing  s,  or  of  increasing  the  rate  of  decay  of  the  eigenvalues  of  the  operator  C, 
i.e.  increasing  q.  If  we  keep  the  order  of  B,  s,  fixed  and  increase  the  rate  of  decay  q,  we  see 
that  the  condition  of  (44)  deteriorates  towards  1/nt  as  q + *.  Similarly,  if  we  fix  the  rate  of 


Page  26 


decay  q and  increase  the  degree  of  smoothing  s we  improve  the  condit  nning.  These 
comments  are  for  a fixed.  However,  as  we  vary  s or  q we  do  not  expect  the  changes  in  a 
to  be  enough  to  compensate  for  the  changes  in  (46)  so  that  there  is  a significant  net  gain  in 
practice. 

The  problem  of  analytic  continuation  given  in  section  2,  see  equation  (13)  is  an  interest- 
ing example  of  type  (44).  The  appropriate  discretization  yields  a symmetric  circulant  matrix 

whose  eigenvalues  are  \j  = (ri  + r"-j)/(l  - r")  j = 0 n/2  . Since,  r < 1,  for  large  n,  Aj 

is  of  the  order  t>  j = 0.„„  n/2.  (Assume  n is  even).  See  Varah  [8]  for  details.  If  we 
regularize  this  problem  using  (42)  the  eigenvalues  of  the  regularized  matrix  are  for  a given 
r < 1 of  the  order, 

Aj(0)  ~ H aj2. 

As  before  min  Xj(D)  is  not  smaller  than  the  minimum  of 
<>(to)  = r“  + a(fa>“  + 1)  . 

<t>  is  convex  with  a unique  minimum  u(a)  satisfying 
2au 

e-a«  = (52) 

a 

where  a = - In  r.  Taking  logarithms  of  both  sides  and  assuming  that  aw  >>  In  w,  we  obtain 
the  estimate  w(a)  ~ In  a / In  r.  Using  this  estimate  we  estimate  the  condition  of  (44)  as 
~(o(ln  a)*)  * . For  a = 10  this  quantity  is  like  10^.  Therefore,  the  exponential  decay  in 

C adversely  affects  the  conditioning  and  as  is  discussed  in  Cullum  132)  the  corresponding 
goodness  of  the  regularization  approximation. 

We  now  consider  the  more  general  case  of  Toeplitz  matrices  and  convolution  operators. 
We  note  again  that  starting  with  (33),  depending  on  the  problem  and  the  quadrature  rule  used. 


(50) 


(51) 
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we  may  or  may  not  generate  a Toeplitz  approximation.  In  particular  if  K is  Toeplitz  so  is  K‘, 
however,  K*K  need  not  be  Toeplitz,  so  we  have  to  be  careful. 

As  described  above,  for  any  circulant  matrix  the  coefficients  of  the  associated  finite 
Fourier  transform  and  the  eigenvalues  of  the  matrix  are  identical,  so  that  we  can  discuss  the 
condition  of  the  circulant  by  using  the  maximum  and  minimum  of  the  Fourier  transform 
coefficients  or  of  the  eigenvalues. 

Symmetric  Toeplitz  matrices  and  circulant  matrices  are  strongly  related  Gray  [23],  and  for 
such  matrices  we  have  the  following  generalization  of  the  relationship  between  the  Fourier 
transform  of  the  Toeplitz  matrix  and  the  eigenvalues  of  that  matirx. 

Theorem  5.  Widom  [33]  Let  T^  =/'Co  c,...  c„.')y 


be  a symmetric  Toeplitz  matrix.  We  assume  that  the  c„  are  the  Fourier  coefficients  of  an  L-,  - 
function  <j>.  Let  m = ess,  inf  <#>  and  M = ess.  sup.  ^ on  the  real  line. 

Then  all  the  eigenvalues  of  T„  satisfy 

m < Xj(Tn)  < M . (53) 

This  subject  matter  is  discussed  in  detail  in  Grenander  and  Szego  [34|.  In  our  case  M =*  + * 
so  we  also  need  the  following  theorem. 

Theorem  6 Grenander  and  Szego  (34).  If  A is  any  Hermitian  matrix  for  which  the  sum  of 
the  moduli  of  the  elements  in  each  row  have  a common  bound. 
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Then 


I Xj(A)  I < M 1 S j < n. 


Theorem  6 is  simply  a statement  about  matrix  norms,  see  Wilkinson  [35],  for  example. 


It  should  be  emphasized  that  Theorem  5 is  not  directly  applicable  to  our  problem  because 
it  applies  to  finite  Toeplitz  matrices  that  are  sections  of  infinite  matrices.  As  we  increase  n 
in  our  discrete  approximations  to  (33)  we  are  not  simply  constructing  higher  dimensional 
sections  of  one  infinite  dimensional  Toeplitz  matrix,  so  the  following  discussion  is  somewhat 
heuristic.  Thus,  we  will  consider  the  Fourier  transform  of  our  operator  K*K  + aB  and 
replace  the  normal  definition  of  conditioning  of  a system  of  linear  equations  by  the  ratio  of  the 
maximum  of  the  Fourier  transform  of  our  operator  to  its  minimum.  This  is  sometimes  called 
the  dynamic  range,  Makhoul  [36]. 

We  want  to  solve  (33);  we  assume  K is  symmetric  and  positive  definite.  The  backward 
heat  equation  is  an  example  of  such  a problem,  Varah  [8].  We  regularize  (33)  using  Theorem 
4 and  take  the  Fourier  transform  of  equation  (44),  obtaining 

A A A A 

(K  + a B)  f =2  (54) 

where  A denotes  Fourier  transform.  We  have  neglected  the  effect  of  the  finite  interval.  We 
are  in  fact  acting  as  though  our  equation  were 


K(t-s)  f(s)  ds  = g(t) 


(55) 


If  we  apply  Theorem  5 to  our  problem  (heuristically)  having  first  used  the  simplest 
quadrature  rule,  namely  rectangles,  then  the  Fourier  transform  of  the  operator  K corresponds 
to  the  function  <f>.  Thus,  we  argue  that  the  maximum  and  minimum  of  this  Fourier  transform 
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of  K give  us  an  estimate  of  the  condition  of  the  discrete  Toeplitz  systems  (2).  For  example 
for  numerical  differentiation  taking  Fourier  transforms  of  (38)  we  get  (54)  equals 

1 - A 1 A 

— + a(u2  + 1)  f = _'g'  . 

As  obtained  in  (46),  the  minimum  of  this  transform  is  2v^.  Clearly,  it  has  no  upper  bound. 
However,  in  practice  the  number  of  data  points  n is  bounded  for  example  n < lO"*  and  we 
can  use  Theorem  6 to  get  an  upper  bound  on  the  eigenvalues.  Let  k be  an  upper  bound  on 
K then 


M < an^  8 + k 


(57) 


Therefore,  the  condition  number  of  the  matrix  D generated  from  (35)  satisfies 


^max  (D) 
^min  (D) 


< n^  + — 
/a 


(58) 


If  a = 10"*  and  n = 10^  then  this  becomes 


^max  (D) 

(D) 


< 8 + 10-*k 


(59) 


A comment  about  the  preceeding  arguments  is  necessary.  In  practice  see  Cullum  [25) 
Hilgers  [31],  it  was  proposed  that  one  invert  the  B operator  and  obtain  a pure  integral 
equation.  Cullum  [25]  performed  a nonsymmetric  inversion,  Hilgers  [31]  a symmetric  one. 
This  eliminates  the  need  to  approximate  derivatives  of  f in  Bf  by  differences.  This  conver- 
sion. however,  yields  an  equation  whose  condition  behaves  like  1/a,  and  we  are  back  to  the 
same  effect  on  the  conditioning  as  using  B = 1,  except  that  the  conversion  has  simultaneously 
smoothed  the  data  and  the  equations.  In  many  cases  this  smoothing  effectively  removes  the 
noise,  and  the  smoothing  has  been  done  in  a natural  way.  Lee  et  al  [11]  and  Hilgers  [31] 
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mention  the  question  of  whether  to  discretize  (38)  directly  or  to  invert  B first.  However,  to 
date,  there  is  no  published  analysis  of  this  question. 


Now  let  us  use  Fourier  transforms  to  examine  what  regularization  is  doing  to  any  random 
noise  in  the  data.  Anderssen  and  Bloomfield  [12]  did  this  for  numerical  differentiation,  using 
the  2nd  derivative  of  equation  (38).  To  solve  (38),  again  ignore  the  fact  that  we  are  really  on 
a finite  interval,  take  Fourier  transforms  and  obtain  (54). 

Then 


f 


+ a B 


(60) 


We  can  deconvolve  (60)  only  if  the  right-hand  side  is  in  L,.  For  the  numerical  differentiation 
formula.  K is  not  symmetric  and  we  have  to  use  equation  (38)  which  becomes  equation  (56) 


+ a(u^  + 1) 


Note  that  in  Cullum  [25]|^’  fj^  was  added  to  the  P(a)  minimization,  one  can  show  that  this 
improves  the  conditioning,  but  it  cannot  be  incorporated  into  the  Fourier  transform  frame- 
work. In  this  case 


-i«  ^ 


I + a 


(61) 


For  small  a and  u,  ^ in  (61)  behaves  like  for  u large  like  ^ /au^,  with  a smooth 
transition  in  between.  Therefore,  ^ e L,  if  ^ e L,,  and  deconvolution  is  legitimate.  In 
particular  if  ^ ^ then,  the  error  in  ^ for  large  w 


0 (V/,) 


(62) 


Thus,  we  have  an  explicit  means  of  decreasing  the  effects  of  highly  oscillatory  error  in  the 


data. 
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If  random  noise  is  present  in  the  measurements  so  that  we  have  g £ instead  of 

g,  then  gg  may  have  a spiky  pattern  and  these  spikes  may  appear  in  g^  as  peaks.  But  the 
spikes  in  gg  should  by  assumption  correspond  to  high  frequencies  o)  and  at  high  frequencies 
gg  is  modulated  by  One  could  get  higher  modulation  using  higher  derivatives. 

A A 

Thus,  the  purpose  of  aB  is  to  counteract  the  decay  in  K*K(u))  and  make  the  deconvolu- 
tion legitimate.  If  B is  symmetric  Toeplitz  and  given  for  example  by  (42),  we  know  its 
eigenvalues  + * as  n + =o,  so  §(<*>)  ♦ * as  a)  + =o.  Therefore,  we  have  constructed  a smooth 
filter  on  the  transform  of  the  given  data  that  decays  monotonically  to  zero  as  w + ».  How 
quickly  it  decays  depends  upon  a and  the  order  of  B.  This  filter  has  the  functional  analytic 
interpretation  given  in  (36).  Regularization  is  a modification  of  our  given  operator,  not  just  a 
projection  or  modification  of  the  data.  For  numerical  differentiation,  if  g = 2 a„  cosnirt,  then 
as  Anderssen  and  Bloomfield  [I2l  showed  the  nth  Fourier  coefficient  of  the  solution  f, 

gn  . ' (63) 


1 + a (nir)^  + alnir)"* 


In  numerical  work  we  have  to  use  the  finite  Fourier  transform  and  we  have  to  understand 
the  effects  of  having  only  a finite  number  of  data  points  from  a finite  interval  instead  of  the 
continuous  values  over  the  whole  real  line.  The  finite  interval  corresponds  to  multiplying  a 
function  defined  on  the  whole  interval  by  a unit  pulse;  the  finite  number  of  samples  introduces 
an  effect  called  aliasing,  when  the  effects  at  many  frequencies  get  mapped  onto  the  same 
frequency.  See  Bloomfield  [24]  for  details.  Therefore,  we  have  to  assume  that  we  have 
sampled  at  a high  enough  sampling  rate  to  pick  up  the  maximal  frequency  in  the  noise.  This  is 
typically  feasible  since  most  instruments  are  band-limited  so  that  the  noise  in  the  data  is  also 
band-limited.  Otherwise,  a portion  of  the  noise  could  be  mapped  back  into  the  lower  frequen- 
cies which  should  represent  the  solution. 
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Let  us  summarize  the  above  comments.  If  the  error  is  directly  related  to  frequency  or 
oscillation,  than  a natural  basis  for  our  space  is  the  Fourier  exponential  functions  since  these 
are  parameterized  in  the  frequency.  A smooth  weighting  of  the  coefficients  in  the  expansion 
of  given  functions  in  this  basis  corresponds  to  damping  any  highly  oscillatory  error  and  not  to 
damping  the  smooth  solution. 

Regularization,  however,  has  its  own  problems,  of  a different  nature  than  the  SVDT.  One 
of  these  is  discussed  in  Cullum  [32]  where  we  consider  the  choice  of  B and  argue  that  this 
choice  is  important  because  it  affects  the  condition  and  the  rate  of  convergence  of  the  solution 
of  the  P(a)  problem  to  the  solution  of  the  original  problem.  If  there  is  noise  in  our  data,  we 
know  we  cannot  make  a arbitrarily  small.  Franklin  [16]  discusses  rates  of  convergence  but 
doesn't  explain  what  is  causing  this  bad  behavior.  One  of  the  examples  is  the  backward  heat 
equation  which  Varah  [8]  solved  using  the  SVDT.  We  noted  earlier  that  for  this  problem  the 


eigenfunctions  have  the  oscillation  property  so  there  is  a direct  connection  between  truncating 
the  eigenvector  expansion  and  in  mollifying  the  noise.  Therefore,  we  would  expect  SVDT  to 


work  better  on  this  problem  than  regularization. 


5.  Summar 


Three  procedures  SVDT,  Hanson  [7],  the  Ekstrom-Rhoads,  Hankel  matrix  procedure  [9], 
and  Tikhonov  regularization  [10]  have  been  described  and  their  treatment  of  highly  oscillatory 
noise  in  the  data  described.  We  have  seen  that  SVDT  uses  the  singular  vectors  of  the  operator 
and  is  equivalent  to  projecting  the  given  data  onto  a subspace  of  the  singular  vectors. 


It  is  numerically  stable,  but  requires  the  amplified  projections 


•>1 

' to  decay 


to  zero  rapidly,  and  except  when  the  operator  involved  has  the  oscillation  property  gives  no 


explicit  means  of  mollifying  the  effects  of  the  errors  in  the  data.  This  latter  property  was 


demonstrated  by  an  example. 
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We  have  also  seen  that  the  Ekstrom  and  Rhoads  procedure  [9)  which  they  discuss  only  for 
convolution  equations,  but  which  could  be  extended  to  general  systems  using  the  SVD  and  a 
weighting  which  measures  variation  in  either  the  eigenvectors  of  K*K  or  the  eigenvectors  of 
KK*  , is  also  a SVD  approach.  The  original  system  is  replaced  by  an  equivalent  symmetric 
system  and  an  expansion  made  in  the  eigenvectors  of  this  system.  The  weighting  they 
introduce  will  damp  components  corresponding  to  highly  oscillatory  eigenvectors  and  damp  the 
effects  of  small  eigenvalues  thereby  making  the  procedure  numerically  stable.  However,  as  we 
demonstrated  by  example  the  damping  proposed  may  also  damp  the  desired  solution  due  to  the 
mixing  of  frequencies  which  can  occur  when  one  uses  an  eigenvector  basis.  Moreover,  the 
weightings  that  they  introduce  do  not  seem  to  have  functional  analytic  interpretations. 

The  Tikhonov  regularization  procedure  was  also  discussed  for  convolution  equations,  and 
Fourier  transforms  were  introduced.  We  used  the  Fourier  transform  to  estimate  the  condition 
of  the  regularization  approximations.  Using  numerical  differentiation  as  an  example  we 
showed  (as  in  Bloomfield  and  Anderssen  [12])  that  there  exists  a direct  relationship  between 
the  damping  associated  with  the  regularization  parameter  a and  the  elimination  of  highly 
oscillatory  noise.  We  also  noted,  however,  that  regularization  is  not  necessarily  always  the 
method  to  use,  its  usefulness  depends  upon  the  rate  of  convergence  of  the  regularization 
approximations.  This  is  discussed  in  more  detail  in  Cullum  [32]. 
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singular  v.ilue  deco  nposiiton  with  truncation,  (SV’DI  ) a llankcl  transformation  with  damping, 
and  the  Tikhonov  rceiilari/ation  proeedure.  on  such  noise  in  the  data  We  ilcmonstratc  that  in 
general,  rcgularir.ition  is  the  most  natural  setting  for  mollifying  the  effects  of  such  noise. 
UowcNer,  for  certain  problems  SVDf  is  equally  suitable  and  in  fact  may  be  better  if  the  rate 
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